
    Rapid Adaptation of Foreign-accented HMM-based Speech Synthesis

    This paper presents findings on listeners' perception of speaker identity in synthetic speech. Specifically, we investigated the effect on a speaker's perceived identity of using differently accented average voice models and limited amounts (five and fifteen sentences) of the speaker's data to create the synthetic stimuli. A speaker discrimination task was used to measure speaker identity. Native English listeners were presented with natural and synthetic speech stimuli in English and were asked to decide whether they thought the sentences were spoken by the same person or not. An accent rating task was also carried out to measure the perceived accents of the synthetic speech stimuli. The results show that listeners, for the most part, perform as well at speaker discrimination when the stimuli have been created using five or fifteen adaptation sentences as when using 105 sentences. Furthermore, the accent of the average voice model does not affect listeners' speaker discrimination performance, even though the accent rating task shows that listeners perceive different accents in the synthetic stimuli. Listeners do not base their speaker similarity decisions on perceived accent.
    Index Terms: speech synthesis, rapid adaptation
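The outcome of a same/different speaker discrimination task like this is commonly summarised with the sensitivity index d′ from signal detection theory. The sketch below is an illustrative reconstruction, not the paper's own analysis code; it assumes only same-speaker/different-speaker trial counts and uses the Python standard library.

```python
from statistics import NormalDist

def d_prime(hits, same_trials, false_alarms, diff_trials):
    """Sensitivity index d' for a same/different speaker task.

    A 'hit' is a same-speaker pair correctly judged "same"; a
    'false alarm' is a different-speaker pair judged "same".
    Rates of exactly 0 or 1 are nudged by half a trial (the
    log-linear correction) so the inverse normal CDF stays finite.
    """
    hit_rate = (hits + 0.5) / (same_trials + 1)
    fa_rate = (false_alarms + 0.5) / (diff_trials + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

With such a measure, d′ near zero means listeners cannot tell the speakers apart, and larger positive values mean better discrimination; comparing d′ across the 5-, 15-, and 105-sentence conditions is one way to quantify "perform as well".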

    Cross-lingual acoustic model adaptation for speaker-independent speech recognition

    For good-quality speech recognition, the recognition system must be able to adapt to each speaker's voice and speaking style. Most speech recognition systems are developed for very specific purposes and for linguistically homogeneous groups. However, as user groups increasingly comprise people from differing linguistic backgrounds, there is a growing demand for efficient multilingual speech technology that takes into account not only varying dialects and accents but also different languages.
    This thesis investigated how the acoustic models for English and Finnish can be efficiently combined to create a multilingual speech recognition system, and how these combined systems perform speaker adaptation within and across languages, using data from one language to improve recognition of the same speaker speaking another language. Recognition systems were trained on large Finnish and English corpora and tested on both monolingual and bilingual material. This study shows that the thresholds for safe merging of the Finnish and English model sets are so low that merging can hardly be motivated on efficiency grounds. It was also found that recognition of native Finnish could be improved using English speech data from the same speaker. This only works one way: recognition of English spoken as a foreign language could not be significantly improved with the help of Finnish speech data.
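The "safe merging threshold" idea can be illustrated with a small sketch: one common way to decide whether two Gaussian HMM states from different languages are similar enough to share is to compare a symmetrised Kullback-Leibler divergence against a threshold. This is a minimal illustration of that general technique, not the thesis's actual clustering code; the function names, the diagonal-covariance assumption, and the threshold value are all assumptions.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians,
    summed over dimensions (closed form, per dimension)."""
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl

def should_merge(state_a, state_b, threshold):
    """Merge two HMM states (each a (means, variances) pair) when
    the symmetrised divergence between them is below the threshold."""
    div = (kl_diag_gauss(*state_a, *state_b)
           + kl_diag_gauss(*state_b, *state_a))
    return div < threshold
```

A low safe threshold, as reported here, means only near-identical states may be tied across Finnish and English before recognition accuracy suffers, so few states actually merge and the efficiency gain is small.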

    Speaker similarity evaluation of foreign-accented speech synthesis using HMM-based speaker adaptation

    This paper describes a speaker discrimination experiment in which native English listeners were presented with natural and synthetic speech stimuli in English and were asked to judge whether they thought the sentences were spoken by the same person or not. The natural speech consisted of recordings of Finnish speakers speaking English. The synthetic stimuli were created using adaptation data from the same Finnish speakers. Two average voice models were compared: one trained on Finnish-accented English and the other on American-accented English. The experiments illustrate that listeners perform well at speaker discrimination when the stimuli are both natural or both synthetic, but when the speech types are crossed, performance drops significantly. We also found that the type of accent in the average voice model had no effect on the listeners' speaker discrimination performance.

    Non-game like training benefits spoken foreign-language processing in children with dyslexia

    Publisher Copyright: © 2023 Junttila, Smolander, Karhila, Kurimo and Ylinen.
    Children with dyslexia often face difficulties in learning foreign languages, which is reflected as weaker neural activation. However, digital language-learning applications could support learning-induced plastic changes in the brain. Here we aimed to investigate whether plastic changes occur in children with dyslexia more readily after targeted training with a digital language-learning game or after similar training without game-like elements. We used auditory event-related potentials (ERPs), specifically the mismatch negativity (MMN), to study learning-induced changes in brain responses. Participants were 24 school-aged Finnish-speaking children with dyslexia and 24 age-matched typically reading control children. They practised English speech sounds and words with the "Say it again, kid!" (SIAK) language-learning game for 5 weeks between ERP measurements. During the game, the players explored game boards and produced English words aloud to score stars as feedback from an automatic speech recognizer. To compare the effectiveness of the training types (game vs. non-game), we embedded in the game some non-game levels stripped of all game-like elements. In the dyslexia group, the non-game training increased the MMN amplitude more than the game training, whereas in the control group the game training increased the MMN response more than the non-game training. In the dyslexia group, the MMN increase with the non-game training correlated with phonological awareness: the children with poorer phonological awareness showed a larger increase in the MMN response. Improved neural processing of foreign speech sounds, as indicated by the MMN increase, suggests that targeted training with a simple application could alleviate some spoken foreign-language learning difficulties that are related to phonological processing in children with dyslexia. Peer reviewed
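In general, the MMN is obtained from the deviant-minus-standard ERP difference wave, averaged over a latency window. The sketch below illustrates that standard computation on plain Python lists; the window bounds and sampling grid are assumptions for illustration, since the abstract does not give the study's actual analysis parameters.

```python
def mmn_amplitude(standard_erp, deviant_erp, times, window=(0.15, 0.25)):
    """Mean amplitude of the deviant-minus-standard difference wave
    within a latency window (seconds after stimulus onset).

    All three lists share the same sampling grid. A genuine MMN
    appears as a negative value; training-induced learning shows up
    as this value growing more negative between measurements.
    """
    lo, hi = window
    diffs = [d - s
             for s, d, t in zip(standard_erp, deviant_erp, times)
             if lo <= t <= hi]
    return sum(diffs) / len(diffs)
```

Comparing this amplitude before and after the 5-week training, per group and per training type, is the kind of contrast the reported game vs. non-game effects rest on.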

    The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language

    Digital and mobile devices enable easy access to applications for learning foreign languages. However, experimental studies on the effectiveness of these applications are scarce. Moreover, it is not understood whether the effects of speech and language training generalize to features that are not trained. To this end, we conducted a four-week intervention focused on articulatory training and learning of English words in 6-7-year-old Finnish-speaking children who used the digital language-learning game app Pop2talk. An essential part of the app is automatic speech recognition, which enables assessing children's utterances and giving instant feedback to the players. The generalization of the effects of such training in English was explored using discrimination tasks before and after training (or the same period of time in a control group). The stimuli of the discrimination tasks represented phonetic contrasts from two non-trained languages: Russian sibilant consonants and Mandarin tones. We found some improvement with the Russian sibilant contrast in the gamers, but it was not statistically significant. No improvement was observed for the tone contrast in the gaming group. A control group with no training showed no improvement in either contrast. The pattern of results suggests that the game may have improved the perception of non-trained speech sounds in some but not all individuals, yet the effects of motivation and attention span on their performance could not be excluded with the current methods. Children's perceptual skills were linked to their word learning in the control group but not in the gaming group, where recurrent exposure also enabled learning for children with poorer perceptual skills. Together, the results demonstrate beneficial effects of learning via a digital application, yet raise a need for further research on individual differences in learning. Peer reviewed

    User Experiences from L2 Children Using a Speech Learning Application : Implications for Developing Speech Training Applications for Children

    We investigated user experiences of 117 Finnish children aged between 8 and 12 years in a trial of an English language learning programme that used automatic speech recognition (ASR). We used measures that encompassed both affective reactions and questions tapping into the children's sense of pedagogical utility. We also tested their perception of sound quality and compared reactions to game-based and non-game-based versions of the application. Results showed that children expressed higher affective ratings for the game-based version of the application than for the non-game version. Children also expressed a preference for playing with a friend over playing alone or within a group. They found the assessment of their speech useful, although they did not necessarily enjoy hearing their own voices. The results are discussed in terms of the implications for user interface (UI) design in speech learning applications for children. Peer reviewed